Reorganize global project (macros) #4154

jtcohen6 · 2021-10-28T09:09:42Z

Picking up from #3858. I got the itch for some autumnal cleaning. It's just too hard today to find the macros I'm looking for, or to even know where they might be. I'd really like this to be better for v1.

Dream scenario: We also document (description + args) the most important global macros, especially ones that users are likely to override/reimplement. That can come after rc1, and it can be a team effort :)

Trade-off:

Much easier to find these macros in the future
Much harder to use git blame to understand when + why they've changed in the past (but they don't change so so often, and usually not one critical line at a time, save for pesky syntax errors)

Also:

dispatch more macros
move generic tests (just the test part) into tests/generic, following Er/3250 generic tests #4052, with a comment explaining why most of the actual SQL lives separately — do we like this?

No intention for functional change in this PR. Very open to feedback!

Checklist

I have signed the CLA
I have run this code in development and it appears to resolve the stated issue
~~This PR includes tests, or tests are not required/relevant for this PR~~ — existing tests all passing (🤞)
I have updated the CHANGELOG.md and added information about my change

dataders

thanks @jtcohen6! I can tell you put a lot of thought into this.

I definitely support this change and that it ships as part of 1.0. I'd like to believe these macros will be more stable in 1.0 so the git history is not as important.

Random thought, but with this new macro taxonomy, I'm imagining just looking at an adapters include/ file tree, would give you the ways that an adapter differs from the core/postgres implementation.

questions

would now (or at least pre-1.0) be a good time to rename adapter macros based on whether they return sql strings or actually execute SQL (get_x_sql vs alter_x_do_y) (previous discussion in dbt Slack)
what's the ideal ordering of macros?
- alphabetically?
- most used to least used?
- those without default__ versions first?
related to above, anyway we can sandbox the macros that are required to be implemented for a custom adapter? or maybe we update the scaffolding to give the filetree and files for macros they have to implement? I guess I'd be interested to see what an adapter's filetree would look like in this new model...

dataders · 2021-11-01T22:55:20Z

core/dbt/include/global_project/docs/overview.md

@@ -38,6 +38,6 @@ Note that you can also right-click on models to interactively filter and explore
 - [What is dbt](https://docs.getdbt.com/docs/overview)?
 - Read the [dbt viewpoint](https://docs.getdbt.com/docs/viewpoint)
 - [Installation](https://docs.getdbt.com/docs/installation)
- Join the [chat](https://community.getdbt.com/) on Slack for live questions and support.
+- Join the [dbt Community](https://www.getdbt.com/community/) for questions and discussion


side note: do we need these auto-generated docs here even?

This is what's included by default in the dbt-docs site Overview, until/unless users provide a custom overview. It could be simpler / have less info in it — is that what you're thinking?

sure. but does the global_project need it? global_project doesn't show up in anyones dbt docs, nor are any of the macros documented? I guess this file is just present in all scaffolded dbt projects?

That's right - this is how it gets into users' project, by default.

It's an opinionated choice to exclude internal/global macros from the dbt-docs site; they're included in manifest.json, just like every other macro. I'd like to document global macros (perhaps a community-sourced effort??), which we can use to power other cool metadata-y things down the line

dataders · 2021-11-01T22:57:11Z

core/dbt/include/global_project/macros/adapters/columns.sql

+{% macro get_columns_in_relation(relation) -%}
+  {{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}
+{% endmacro %}
+
+{% macro sql_convert_columns_in_relation(table) -%}
+  {% set columns = [] %}
+  {% for row in table %}
+    {% do columns.append(api.Column(*row)) %}
+  {% endfor %}
+  {{ return(columns) }}
+{% endmacro %}


suggest stronger conformity to the pattern of the default__ macro directly following the parent macro

Suggested change

{% macro get_columns_in_relation(relation) -%}

{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}

{% endmacro %}

{% macro sql_convert_columns_in_relation(table) -%}

{% set columns = [] %}

{% for row in table %}

{% do columns.append(api.Column(*row)) %}

{% endfor %}

{{ return(columns) }}

{% endmacro %}

{% macro sql_convert_columns_in_relation(table) -%}

{% set columns = [] %}

{% for row in table %}

{% do columns.append(api.Column(*row)) %}

{% endfor %}

{{ return(columns) }}

{% endmacro %}

{% macro get_columns_in_relation(relation) -%}

{{ return(adapter.dispatch('get_columns_in_relation', 'dbt')(relation)) }}

{% endmacro %}

dataders · 2021-11-01T23:04:09Z

core/dbt/include/global_project/macros/adapters/relation.sql

+{# here for backwards compatibility #}
+{% macro drop_relation_if_exists(relation) %}
+  {% if relation is not none %}
+    {{ adapter.drop_relation(relation) }}
+  {% endif %}
+{% endmacro %}
+
+{# here for backwards compatibility #}
+{% macro load_relation(relation) %}
+  {% do return(adapter.get_relation(
+    database=relation.database,
+    schema=relation.schema,
+    identifier=relation.identifier
+  )) -%}
+{% endmacro %}


in a post 1.0 world, do we need to keep these?

These are definitely used in other plugins today. My preference would be to:

call out that they're just pass-throughs for other macros (via these comments)

avoid calling them in any global materializations/macros

update our (Labs-maintained) plugins to avoid using them

not to remove them entirely — I don't see a ton of harm in keeping them around, and it's one more bump in the road for plugins/projects upgrading

In retrospect, load_relation actually offers a real convenience. I'm going to add it back

dataders · 2021-11-01T23:07:00Z

core/dbt/include/global_project/macros/etc/column_helpers.sql

+{% macro column_list(columns) %}
+  {%- for col in columns %}
+    {{ col.name }} {% if not loop.last %},{% endif %}
+  {% endfor -%}
+{% endmacro %}


perhaps justifying this PR further, but this is the first time I've seen this macro before... ah! /global_project/macros/materializations/helpers.sql! guess i never looked there before... now I'm racking my brain for instances of this macro not being used where it should be.

Honestly, we've probably reinvented this wheel a dozen+ times. I didn't realize it existed until I did the work for this PR either. We have some utilities in core that we've used for various materializations, but some of them aren't even being used anymore. At this point, we could delete them entirely, or to keep them around forever—either feels like a valid approach.

I don't see column_list or column_list_for_create_table being used in any of our global or plugin macros/materializations, so I think I'm just going to snip these out

dataders · 2021-11-01T23:09:48Z

core/dbt/include/global_project/macros/etc/configs.sql

+{% macro should_store_failures() %}
+  {% set config_store_failures = config.get('store_failures') %}
+  {% if config_store_failures is none %}
+    {% set config_store_failures = flags.STORE_FAILURES %}
+  {% endif %}
+  {% do return(config_store_failures) %}
+{% endmacro %}


is this used outside of the test materialization? for me this configs.sql is the once file i feel iffy about existing, but I see where you're coming from.

To me, should_store_failures() feels very closely related to should_full_refresh(). They have almost identical logic. The should_full_refresh() macro is used by both the incremental model materialization + seeds. Do you think it would make more sense to move configs.sql to macros/materializations/ instead of macros/etc?

I've decided, yes, it would make more sense to have these in macros/materializations/configs.sql

dataders · 2021-11-01T23:13:23Z

core/dbt/include/global_project/macros/etc/column_helpers.sql

+{% macro diff_columns(source_columns, target_columns) %}
+
+  {% set result = [] %}
+  {% set source_names = source_columns | map(attribute = 'column') | list %}
+  {% set target_names = target_columns | map(attribute = 'column') | list %}
+
+   {# --check whether the name attribute exists in the target - this does not perform a data type check #}
+   {% for sc in source_columns %}
+     {% if sc.name not in target_names %}
+        {{ result.append(sc) }}
+     {% endif %}
+   {% endfor %}
+
+  {{ return(result) }}
+
+{% endmacro %}


curious (but no strong opinion), on the home for helper macros that are only currently used for a specific materialization... I like the categorization but maybe unnecessary complexity to go to another file when no other macros are using it?

Yeah, good thought. As above, the question for me is:

Should we consider these "private" methods that exist for internal (global materialization) use only, not to be documented, and used at users' own risk

Should dbt-core's global project become more of a global repository of handy, nice-to-have helper macros like this one

Somewhere in between: these are mostly internal methods, they're useful for adapter plugin maintainers when extending dbt-core to new databases, but they shouldn't be called / relied on any old project

When I put it like that, I actually find myself leaning toward 2 or 3, rather than 1. Which is to say, they should probably live in a file like materializations/internal_helpers

dataders · 2021-11-01T23:14:56Z

core/dbt/include/global_project/macros/materializations/models/table/create_table.sql

+{% macro create_table_as(temporary, relation, sql) -%}
+  {{ adapter.dispatch('create_table_as', 'dbt')(temporary, relation, sql) }}
+{%- endmacro %}
+
+{% macro default__create_table_as(temporary, relation, sql) -%}
+  {%- set sql_header = config.get('sql_header', none) -%}
+
+  {{ sql_header if sql_header is not none }}
+
+  create {% if temporary: -%}temporary{%- endif %} table
+    {{ relation.include(database=(not temporary), schema=(not temporary)) }}
+  as (
+    {{ sql }}
+  );
+
+{% endmacro %}


love that these macros be broken into their own files! nit: can we make it create_table_as.sql? and do the same thing for create_view_as macro (create_view_as.sql)?

yes, absolutely!

dataders · 2021-11-01T23:17:23Z

test/integration/029_docs_generate_tests/test_docs_generate.py

@@ -574,7 +574,7 @@ def _verify_generic_macro_structure(self, manifest):
            if k not in {'macro_sql'}
        }
        # Windows means we can't hard-code these.
-        helpers_path = Normalized('macros/materializations/helpers.sql')
+        helpers_path = Normalized('macros/etc/column_helpers.sql')


this scares me. should I be scared?

No cause for fear. The comment a few lines up:

# just test a known global macro to avoid having to update this every # time they change.

We want to verify that global/internal macros are included in manifest.json, and to do that we just test that one of them exists, with all of its properties, including its path.

jtcohen6 · 2021-11-03T11:14:42Z

@swanderz Thank you for the detailed + thoughtful feedback!!

Questions:

would now (or at least pre-1.0) be a good time to rename adapter macros based on whether they return sql strings or actually execute SQL (get_x_sql vs alter_x_do_y)?

Yes. I think create_table_as and create_view_as are the biggest apostates from this convention. I just updated them to be get_create_table_as_sql and get_create_view_as_sql by default, with the old macros serving as passthroughs for backwards compatibility.

what's the ideal ordering of macros? alphabetically? most used to least used? those without default__ versions first?

Good question! My informal guidelines have been:

We should split up macros into enough files that they are grouped more meaningfully, and so that you don't need to scroll very much or at all (<100 lines)
Materializations should always be alone in their own files
Within each (much smaller) file, macros should proceed in as "logical" of an order as we can muster. The dispatched macro should precede the default__ implementation. If macro_a is called by macro_b, macro_a should precede macro_b.
Beyond that, I don't have super-strong feelings. I think this reorganization is a very partial step on the way toward real macro discovery/discoverability, and we'll need other tools to help.

related to above, anyway we can sandbox the macros that are required to be implemented for a custom adapter? or maybe we update the scaffolding to give the filetree and files for macros they have to implement? I guess I'd be interested to see what an adapter's filetree would look like in this new model...

Yes! All of these macros are sandboxed in global_project/macros/adapters. These are the macros we expect an adapter plugin will need or want to override, since it's behavior that differs across databases.

Right now, I have the macros grouped by function (columns vs. freshness vs. persist_docs), but we could instead group them by which ones have default implementations vs. which must be implemented for a new custom adapter. I think my preference is, as you say, to update the adapter scaffolding to use the same functional file tree when they override/reimplement, rather than one huge file named adapters.sql.

Of course, some adapters will need to reimplement the core materializations in addition to the macros in the adapters/ subdirectory, but we should aspire to evade that necessity as often as we can.

To your earlier comment, all of these tweaks together will enable someone to quickly glance at an adapter's include/<adapter>/macros directory to see how much it differs from core/postgres/default.

* Use the dbt Events Module * Upgrade dbt-core to 1.0.0 * Add a profile_template.yml file * Reorganize macros following dbt-labs/dbt-core#4154 * Use MANIFEST.in instead of package_data see: https://stackoverflow.com/questions/7522250/how-to-include-package-data-with-setuptools-distutils * Do not quote seed columns see dbt-labs/dbt-core@64fc3a3 * Upgrade to 1.0.1

* Add integration tests * Reorganize + dispatch more global macros * Reorg materializations subdir * Move around + document generic tests * Fix failing tests * Fix merge conflict * Grab fix from #4148 * PR feedback * Fixup * Add load_relation back, it was nice * Last few test fixes * Rm incremental_upsert, now unused * Add changelog entry automatic commit by git-black, original commits: 8442fb6

cla-bot bot added the cla:yes label Oct 28, 2021

jtcohen6 marked this pull request as ready for review October 28, 2021 13:04

dataders requested changes Nov 1, 2021

View reviewed changes

jtcohen6 requested a review from leahwicz November 3, 2021 11:54

jtcohen6 self-assigned this Nov 7, 2021

jtcohen6 and others added 11 commits November 7, 2021 19:12

Add integration tests

e79c7d5

Reorganize + dispatch more global macros

5c80d6e

Reorg materializations subdir

48b8d53

Move around + document generic tests

67523e6

Fix failing tests

838a002

Fix merge conflict

8a6c7a7

Grab fix from #4148

9185c58

PR feedback

f4a3fed

Fixup

b03f4f4

Add load_relation back, it was nice

5a2ea5a

Last few test fixes

ec98984

jtcohen6 force-pushed the reorg/global-macros branch from 9d9cfaf to ec98984 Compare November 7, 2021 18:14

jtcohen6 added 2 commits November 7, 2021 19:16

Rm incremental_upsert, now unused

99b4073

Add changelog entry

171b3cc

leahwicz approved these changes Nov 8, 2021

View reviewed changes

jtcohen6 merged commit 8442fb6 into main Nov 8, 2021

jtcohen6 deleted the reorg/global-macros branch November 8, 2021 18:09

jtcohen6 mentioned this pull request Nov 8, 2021

Wrap get_batch_size() in return() #4240

Merged

dataders mentioned this pull request Nov 24, 2021

align to global_project's dir structure dbt-msft/dbt-sqlserver#184

Merged

courentin mentioned this pull request Dec 19, 2021

Support dbt 1.0 Tomme/dbt-athena#51

Closed

5 tasks

courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 20, 2021

Reorganize macros following dbt-labs/dbt-core#4154

fa041bd

courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021

Reorganize macros following dbt-labs/dbt-core#4154

d7136e2

courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021

Reorganize macros following dbt-labs/dbt-core#4154

6bc9b0d

courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021

Reorganize macros following dbt-labs/dbt-core#4154

cceb32a

courentin added a commit to courentin/dbt-athena that referenced this pull request Dec 21, 2021

Reorganize macros following dbt-labs/dbt-core#4154

2a16610

runleonarun mentioned this pull request Feb 10, 2022

Support for multiple unique_key columns dbt-labs/docs.getdbt.com#1105

Closed

1 task

jtcohen6 mentioned this pull request May 9, 2022

Tidying up materialization logic: relation names, cache loading, etc #5221

Merged

6 tasks

jtcohen6 mentioned this pull request Jul 18, 2022

[CT-839] Unable to override exising macro #5467

Closed

1 task

jtcohen6 mentioned this pull request Nov 3, 2022

draft for support python in base mat macros #6127

Closed

6 tasks

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Reorganize global project (macros) #4154

Reorganize global project (macros) #4154

jtcohen6 commented Oct 28, 2021 •

edited

Loading

dataders left a comment

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

dataders Nov 3, 2021

jtcohen6 Nov 4, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021 •

edited

Loading

jtcohen6 Nov 3, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

jtcohen6 Nov 3, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

jtcohen6 Nov 3, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

dataders Nov 1, 2021

jtcohen6 Nov 3, 2021

jtcohen6 commented Nov 3, 2021 •

edited

Loading

Reorganize global project (macros) #4154

Reorganize global project (macros) #4154

Conversation

jtcohen6 commented Oct 28, 2021 • edited Loading

Checklist

dataders left a comment

Choose a reason for hiding this comment

questions

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtcohen6 Nov 3, 2021 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

jtcohen6 commented Nov 3, 2021 • edited Loading

jtcohen6 commented Oct 28, 2021 •

edited

Loading

jtcohen6 Nov 3, 2021 •

edited

Loading

jtcohen6 commented Nov 3, 2021 •

edited

Loading